All Questions
Tagged with scikit-learn feature-selection
87 questions
1 vote
0 answers
19 views
How to correctly use RFECV for feature selection in a Scikit-Learn pipeline with a Simple Decision Tree?
I am working on the Kaggle House Price Prediction competition and have built a Scikit-Learn pipeline that includes: Preprocessing (handling missing values, scaling, encoding) Feature Engineering ...
1 vote
1 answer
207 views
How does a Decision Tree split when two features are tied?
Decision Trees split on the feature and cut-off value that produce the largest decrease in impurity (assuming hyperparameters splitter="best", criterion="gini"). Now take ...
1 vote
0 answers
59 views
sklearn - OneHotEncoding and SelectPercentile
In an sklearn example there is this code ...
1 vote
1 answer
175 views
Integration of Feature Selection in Pipeline
I have noticed that integrating feature selection in a pipeline alters results. Pipeline 1 gives slightly different results from Pipeline 2. Why should this be so? Pipeline 2 ...
1 vote
0 answers
73 views
How does recursive feature elimination with cross-validation work internally?
I am trying to understand how recursive feature elimination with cross-validation works (RFECV in sklearn). Let's say that we have 10 features, and we perform <...
1 vote
1 answer
853 views
Does sklearn perform feature selection within cross validation?
I would like to add a feature selector to my pipeline and use GridSearchCV to tune the hyperparameters of both the selector and the classifier(s). I am wondering if sklearn performs feature selection ...
0 votes
1 answer
619 views
How does SelectFromModel from scikit-learn select features?
When I use XGBClassifier with SelectFromModel the algorithm always returns around five features regardless of the ...
0 votes
1 answer
87 views
Encoding Categorical feature with high cardinality - in my case IP addresses
I'm working on an intrusion detection project and have many categorical features. For some I used label encoding, since they don't have many possible values. But for IP addresses, it's a high cardinality ...
2 votes
0 answers
81 views
Feature selection and model performance
Featuretools provides an automated way to generate features from your data, by providing relationships within your data and applying their so-called deep feature synthesis. It generates features like ...
1 vote
0 answers
38 views
How to return selected features with different feature selection models?
I use the function below to measure the effect of these feature selection models on my data, and it works perfectly. What I want is to return the names of the selected features for each model. Is there any ...
2 votes
1 answer
192 views
What are the differences between the below feature selection methods?
Do the code snippets below do the same thing? If not, what are the differences? ...
1 vote
0 answers
165 views
Using f_regression to find the most significant features
We are trying to use SelectKBest with the f_regression scoring function on a pool of 1000 numerical features to solve a regression problem. We also wanted to parallelize the execution of SelectKBest and we ...
2 votes
1 answer
2k views
How to deal with date features in linear regression?
I need some help with a project. I have a dataframe like this: YEAR MONTH INDICATOR_1 INDICATOR_2 INDICATOR_3 2014 3 0.123 0.495 0.222 My goal is to predict all of the indicators for the next year (...
0 votes
1 answer
308 views
How do I fine-tune model performance after the initial run? (Scikit-Learn)
I've just started learning regression using scikit-learn and stumbled upon a problem. For a given dataset, let's say that I've imputed the missing data and one-hot encoded all categorical features. ...
0 votes
1 answer
100 views
Correlation with target variable for regression problem
Given the following dataframe age job salary 0 1 Doctor 100 1 2 Engineer 200 2 3 Lawyer 300 ... with ...